Sequence Analysis of the 3' End of the Feline Coronavirus FIPV 79-1146 Genome:
Comparison with the Genome of Porcine Coronavirus TGEV Reveals Large Insertions 

RAOUL J. DE GROOT, ARNO C. ANDEWEG, MARIAN C. HORZINEK, and WILLY J. M. SPAAN


Institute of Virology, Veterinary Faculty, State University of Utrecht, Yalelaan 1, 3508 TD Utrecht, The Netherlands 

Received May 5, 1988; accepted July 29, 1988 

The genetic information carried on mRNA 6 of feline infectious peritonitis virus (FIPV) strain 79-1146 was
determined by sequence analysis of cDNA clones derived from the 3' end of the FIPV genome. Two ORFs were found,
encoding polypeptides of 11K (ORF-1) and 22K (ORF-2). The FIPV sequence was compared to the 3' end sequence of
transmissible gastroenteritis virus (TGEV). ORF-1 has a homologous counterpart (ORF-X3) in the TGEV genome; both
ORFs are located at the same position relative to the nucleocapsid gene. However, as a result of an in-frame insertion
or deletion, ORF-1 is 69 nucleotides larger than ORF-X3. A similar event has occurred immediately downstream of
ORF-1: a 624-nucleotide segment, containing the complete ORF-2, is absent in the TGEV sequence. The highest sequence
similarity (98.5%) was found in the 3' noncoding sequences. ORF-X3 and ORF-1 are preceded by the sequence
AACTAAAC, which is assumed to be the transcription-initiation signal in FIPV and TGEV (P. A. Kapke and D. A. Brian (1986)
Virology 151, 41-49). By S1 nuclease analysis, the 5' end of FIPV RNA 6 was mapped immediately upstream of this
sequence. A 700-nucleotide TGEV-specific RNA was found by cross-hybridization with an FIPV 3' end probe, suggesting
that TGEV ORF-X3 is also carried on a separate mRNA. The differences at the 3' ends of the FIPV and TGEV genomes
may be the result of RNA recombination events. © 1988 Academic Press, Inc.


INTRODUCTION 

Coronaviruses, a group of enveloped, positive-stranded
RNA viruses, have attracted considerable interest
because of their unusual replication strategy. In
the infected cell, there are five to seven subgenomic
mRNAs which form a 3' coterminal nested set: they
have common 3' ends but extend for different lengths
in the 5' direction. In addition, the RNAs share a short
5' leader sequence, which is fused to the RNA "body"
via discontinuous transcription (Spaan et al., 1983; Lai
et al., 1984; Brown et al., 1984). Translation of each
RNA is thought to be restricted to the open reading
frames (ORFs) at the 5' end that are not present in the
smaller RNAs (for review see Siddell et al., 1983).
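
The nested-set arrangement can be illustrated with a small sketch. It models only the two smallest FIPV RNAs discussed in this report (RNA 5 and RNA 6); the lookup table `BODY_START` and the function names are hypothetical and serve the illustration only.

```python
# Toy model of a 3' coterminal nested set of subgenomic mRNAs.
# Only the 3' end region of the FIPV genome is modelled.

# ORFs listed in 5' -> 3' order along the genome.
GENOME_ORFS = ["N", "ORF-1", "ORF-2"]

# 5'-most ORF contained in the body of each mRNA (hypothetical lookup table).
BODY_START = {"RNA 5": "N", "RNA 6": "ORF-1"}


def body_orfs(rna):
    """ORFs physically present on the RNA: its 5'-most ORF and everything 3' of it."""
    start = GENOME_ORFS.index(BODY_START[rna])
    return GENOME_ORFS[start:]


def unique_orfs(rna):
    """ORFs in the 5' 'unique' region, i.e. absent from the next smaller RNA."""
    present = body_orfs(rna)
    smaller = [body_orfs(r) for r in BODY_START if len(body_orfs(r)) < len(present)]
    next_smaller = max(smaller, key=len, default=[])
    return [orf for orf in present if orf not in next_smaller]


for rna in BODY_START:
    print(rna, "carries", body_orfs(rna), "| unique region:", unique_orfs(rna))
```

In this model RNA 5 carries all three ORFs, but only N lies in its unique region; the smallest RNA has no smaller neighbour, so its whole body counts as unique.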

In vitro translation of the viral mRNAs (Rottier et al.,
1981; Siddell, 1983; Stern and Sefton, 1984; Jacobs
et al., 1986; de Groot et al., 1987a) and the sequence
analysis of coronavirus genomes (Boursnell et al.,
1987; Rasschaert et al., 1987; Armstrong et al., 1984;
Skinner and Siddell, 1985; Skinner et al., 1985;
Schmidt et al., 1987; Luytjes et al., 1987) have allowed
the construction of genomic maps. The relative position
of the genes encoding the structural proteins is
conserved on the genomes of these viruses. However,
differences in the genomic maps indicated that other
transcription units have been lost, gained, or translocated
as the coronaviruses diverged (de Groot et al.,
1987a).

Feline infectious peritonitis virus (FIPV) and transmissible
gastroenteritis virus (TGEV) of swine belong to the
same antigenic cluster (Pedersen et al., 1978; Horzinek
et al., 1982; Siddell et al., 1983) and are closely related:
sequence analysis of their peplomer genes revealed up
to 93% sequence identity (Jacobs et al., 1987). Despite
this close relationship, TGEV and FIPV differ in their
genomic organization (de Groot et al., 1987a).

TGEV is generally reported to specify six poly(A)-containing
RNAs, the smallest of which (1.9 kb) encodes
the nucleocapsid (N) protein (Jacobs et al., 1986). In
contrast to other coronaviruses, like infectious bronchitis
virus (IBV) and mouse hepatitis virus (MHV), the
nucleocapsid gene is not the 3'-most ORF. A short ORF
(ORF-X3), potentially encoding a polypeptide of 9.1K,
is found further downstream (Kapke and Brian, 1986;
Rasschaert et al., 1987). Although the presumptive
transcription-initiation signal, AACTAAAC, is present
at the 5' end of ORF-X3, it is not clear whether this ORF
is carried on a separate mRNA (Jacobs et al., 1986;
Kapke and Brian, 1986; Rasschaert et al., 1987).

For FIPV an RNA of 2.8 kb (RNA 5) encodes the N
protein, while the smallest RNA (RNA 6) has a length of
about 1450 bp. These findings indicated the presence
of a large insertion at the 3' end of the FIPV genome
as compared to the TGEV genome. In this report we
describe the cloning and sequence analysis of the 3'
end of the FIPV strain 79-1146 genome; a detailed
comparison with the 3' end of the TGEV genome is
presented.

MATERIALS AND METHODS 
Selection and analysis of cDNA clones 

cDNA clones containing sequences derived from the
3' end of the FIPV genome were selected from a "random"
cDNA library of FIPV 79-1146 genomic RNA (de
Groot et al., 1987b) by hybridization to sucrose-gradient
purified, 32P-labeled FIPV RNA 6 in 50% formamide,
5X SSC, 5X Denhardt's, 100 µg/ml salmon sperm
DNA at 42°. Recombinant DNA techniques were
performed by standard methods (Maniatis et al., 1982).
Sequence analysis was carried out using the dideoxynucleotide
chain termination procedure (Sanger et al.,
1977). Sequence data were assembled and analyzed
by using the computer programs by Staden (1986). S1
nuclease analysis and Northern blot analysis were
performed as described (de Groot et al., 1987a,b).

RESULTS 

Isolation and characterization of recombinants 
containing sequences derived from the 3' end of the 
FIPV genome 

The preparation of a cDNA library of FIPV genomic
RNA in pUC9 was described previously (de Groot et al.,
1987b). Recombinants containing sequences derived
from the 3' end of the FIPV genome were isolated by
colony hybridization with an RNA fraction enriched for
RNA 6 (fraction 28, de Groot et al., 1987a). The plasmids
pB12, pC12, and pE7 were selected for sequence
analysis. In Fig. 1 the sequence strategy is outlined.
Three major ORFs were identified (Fig. 1). The 5'-most
ORF could be identified as the 3' end of the FIPV
nucleocapsid gene, the sequence of which was 64%
identical to the corresponding TGEV sequence (not
shown). Figure 2 shows the nucleotide sequence and
the predicted amino acid sequences of the region
downstream of the N gene. As shown in Fig. 1, this
sequence was determined on both strands and on two
independent cDNA clones, except for the 3'-most 68
nucleotides.

ORF-1 (positions 49 to 375) predicts a protein of 108 
residues. In the genomic sequence this ORF overlaps 
with the N gene. The first AUG codon (position 49) is 
followed by the sequence AACTAAAC; a second AUG 
codon is present at position 70 (Fig. 2). ORF-2 (positions
380 to 1000) could encode a polypeptide of 206
residues.
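
The relationship between the quoted coordinates and the predicted protein sizes can be checked with a few lines of arithmetic. This is a minimal sketch that assumes the positions are 1-based, inclusive, and include the terminating stop codon; that assumption is not stated in the text but reproduces the reported residue counts. The function name `orf_summary` is illustrative.

```python
def orf_summary(start, end):
    """Return (length in nucleotides, encoded residues) for an ORF.

    Assumption: coordinates are 1-based and inclusive, and the span
    includes the terminating stop codon.
    """
    length = end - start + 1
    if length % 3 != 0:
        raise ValueError("ORF length is not a multiple of three")
    return length, length // 3 - 1  # subtract one codon for the stop


# Coordinates as given in the text.
for name, (start, end) in {"ORF-1": (49, 375), "ORF-2": (380, 1000)}.items():
    nt, residues = orf_summary(start, end)
    print(f"{name}: {nt} nt, {residues} residues")
# ORF-1: 327 nt, 108 residues
# ORF-2: 621 nt, 206 residues
```

Under the same assumption, starting ORF-1 at the second AUG (position 70) gives `orf_summary(70, 375)`, i.e. 306 nucleotides and 101 residues.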


Comparison with the 3' end of the TGEV genome 

Figure 3a shows a dot matrix comparison of the 3' 
end of the FIPV and TGEV genomes (Kapke and Brian, 
1986). The highest sequence similarity (98.5%) was 
found in the 3' noncoding regions. ORF-1 is 78% identical
to the TGEV ORF-X3 but contains 69 nucleotides in
addition (indicated by a dashed line in Fig. 2). The AUG
start codon of ORF-X3 corresponds to the second AUG 
codon of ORF-1. 
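
For readers unfamiliar with dot matrix comparisons, the idea can be sketched as follows. This is a generic illustration, not the program used for Fig. 3a; the two sequences are invented toy strings, and the window size and match threshold are arbitrary choices.

```python
def dot_matrix(seq_a, seq_b, window=3, min_matches=3):
    """Return the set of (i, j) where the window of seq_a starting at i matches
    the window of seq_b starting at j in at least min_matches positions."""
    dots = set()
    for i in range(len(seq_a) - window + 1):
        for j in range(len(seq_b) - window + 1):
            matches = sum(a == b for a, b in zip(seq_a[i:i + window],
                                                 seq_b[j:j + window]))
            if matches >= min_matches:
                dots.add((i, j))
    return dots


def render(seq_a, seq_b, dots):
    """Draw the matrix as text: one row per position in seq_a."""
    rows = []
    for i in range(len(seq_a)):
        rows.append("".join("*" if (i, j) in dots else "."
                            for j in range(len(seq_b))))
    return "\n".join(rows)


# Invented toy sequences: seq_b lacks the "GGGG" block present in seq_a.
seq_a = "ACGTACGGGGTTCA"
seq_b = "ACGTACTTCA"
dots = dot_matrix(seq_a, seq_b)
print(render(seq_a, seq_b, dots))
```

A segment present in only one sequence shows up as a break in the main diagonal, after which the diagonal resumes with an offset equal to the length of the extra segment.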

A 624-nucleotide segment (positions 376-1000)
immediately downstream of ORF-1 is absent in the TGEV
sequence. Strikingly, this segment corresponds
exactly to ORF-2. A schematic alignment of the TGEV and
FIPV sequences is shown in Fig. 3b.

None of the recombinants isolated from our cDNA 
library contained the poly(A) tail, probably because the 
cDNA synthesis was randomly primed by calf thymus 
DNA pentamers (de Groot et al., 1987b). If aligned to 
the TGEV sequence, the most 3' located clone, E7, 
ends just one nucleotide upstream of the poly(A) tail. 

Localization of the 5' end of the presumptive RNA 
body of RNA 6 

To determine the 5' end of RNA 6, we used S1
nuclease analysis. An M13 recombinant phage containing
the virus-sense strand of the 950-bp PstI-TaqI
fragment (Fig. 1) served as a template to prepare a
uniformly labeled probe. This probe was hybridized to
sucrose-gradient purified RNA 6, followed by S1 nuclease
digestion. A fragment of 514 nucleotides was
protected (Fig. 4a). The precise length was determined in
a sequencing gel (Fig. 4b). This indicates that the 5' end
of the RNA 6 body maps at position 60, immediately
upstream of the AACTAAAC box. Consequently, the
AUG codon at position 49 is not present in RNA 6.
Furthermore, these results suggest that the body of
RNA 6 has a length of 1212 nucleotides, provided that
there are no additional insertions. Assuming an RNA
leader sequence of 60-70 nucleotides (Spaan et al.,
1983; Lai et al., 1984; Brown et al., 1984) and a poly(A)
tail of about 100 nucleotides, we arrive at a predicted
length of approximately 1400 nucleotides. Previously,
RNA 6 was estimated to be 1600 nucleotides (de Groot
et al., 1987a). By using gels with a better resolution in
this MW range, we have now measured a length of
about 1450 nucleotides (not shown).
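
The length estimate above is a simple sum, spelled out here as a short sketch. The input values are those quoted in the text; the function name `predicted_length` is illustrative.

```python
def predicted_length(body_nt, leader_range_nt, poly_a_nt):
    """Return the (low, high) predicted mRNA length in nucleotides."""
    low = leader_range_nt[0] + body_nt + poly_a_nt
    high = leader_range_nt[1] + body_nt + poly_a_nt
    return low, high


BODY_NT = 1212          # RNA 6 body length inferred from the S1 mapping
LEADER_NT = (60, 70)    # assumed leader length range
POLY_A_NT = 100         # approximate poly(A) tail length

low, high = predicted_length(BODY_NT, LEADER_NT, POLY_A_NT)
print(f"predicted RNA 6 length: {low}-{high} nt")
# predicted RNA 6 length: 1372-1382 nt
```

The range of 1372-1382 nucleotides rounds to the approximately 1400 nucleotides stated in the text and lies within about 80 nucleotides of the measured value of about 1450.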

Since the AACTAAAC box preceding ORF-1 apparently
is used as a signal for initiation of transcription,
we expected this also to be the case for the AACTAAAC
sequence preceding the TGEV ORF-X3. Figure
5 shows that in a Northern blot of oligo(dT)-selected
RNAs of TGEV-infected cells, an RNA of about 700
nucleotides can be detected by cross-hybridization with
the 1600-bp PstI fragment of clone B12.

Fig. 1. Restriction endonuclease map and sequence strategy for the FIPV cDNA clones E7, B12, and C12. P, PstI; S, SphI; T, TaqI; and X,
XbaI. Sau3A and RsaI restriction fragments (not shown) were also used to complete the sequence. The upper panel presents a schematic
diagram of the possible open reading frames obtained when translating the nucleotide sequence as virus-sense RNA. The locations of the
nucleocapsid gene (N), ORF-1, and ORF-2 are indicated.

DISCUSSION 

Differences in the mRNA sets of TGEV Purdue and
FIPV 79-1146 indicated the presence of large insertions
at the 3' end of the FIPV genome (de Groot et al.,
1987a). We have characterized these insertions by
sequence analysis of recombinants which had been
isolated from an FIPV-specific cDNA library by hybridization
with the smallest FIPV mRNA (RNA 6).

RNA 6 carries two ORFs, encoding polypeptides of
11K (ORF-1) and 22K (ORF-2). In the genomic
sequence, ORF-1 overlaps with the 3' end of the N gene.
However, by S1 nuclease analysis it was shown that
only sequences downstream of the N gene are
contained in mRNA 6. Consequently, in this RNA only the
second AUG codon of ORF-1 is available for
translation-initiation.

The 5' end of the body of RNA 6 was mapped immediately
upstream of the sequence AACTAAAC, the presumptive
transcription-initiation signal in the FIP/TGE
viruses. This consensus sequence is not present between
ORF-1 and ORF-2. Furthermore, there are no indications
for an RNA smaller than RNA 6. Therefore, if
both ORFs are to be translated, RNA 6 must function
as a bicistronic mRNA. The start codon of ORF-1 and
the two internal, out-of-frame AUGs at positions 86 and
188 are in an unfavorable context for translation-initiation,
while the AUG of ORF-2 ranks among the most
efficient start codons. According to the scanning
hypothesis (Kozak, 1986a,b) this arrangement would favor
translation of ORF-2. Translation of ORF-2 could
also occur via reinitiation (Peabody and Berg, 1986;
Peabody et al., 1986). Coronavirus mRNAs containing
two or three ORFs in the 5' "unique" segment have
previously been described for MHV (Skinner et al.,
1985) and IBV (Boursnell et al., 1985). Recently, Smith
et al. (1987) provided evidence for in vivo expression of
a "downstream" ORF of IBV RNA D.
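
The notion of a favorable or unfavorable AUG context can be made concrete with a deliberately simplified sketch. It assumes the commonly used two-position reading of the Kozak rules (a purine at position -3 and a G at position +4, counting the A of AUG as +1); this simplification, the three labels, and the toy sequences are illustrative assumptions and are not taken from the FIPV sequence.

```python
def kozak_context(mrna, aug_index):
    """Classify the context of the AUG that starts at 0-based aug_index.

    Simplified rule (an assumption of this sketch): only position -3
    (purine favoured) and position +4 (G favoured) are inspected.
    """
    if mrna[aug_index:aug_index + 3] != "AUG":
        raise ValueError("no AUG at the given index")
    minus3 = mrna[aug_index - 3] if aug_index >= 3 else ""
    plus4 = mrna[aug_index + 3] if aug_index + 3 < len(mrna) else ""
    score = (minus3 in ("A", "G")) + (plus4 == "G")
    return ("weak", "adequate", "strong")[score]


# Invented toy sequences, each with an AUG at index 6.
print(kozak_context("GCCACCAUGGCG", 6))  # strong
print(kozak_context("UUUAUUAUGUUU", 6))  # adequate
print(kozak_context("UUUUUUAUGUUU", 6))  # weak
```

In a scanning model, a ribosome is more likely to bypass an upstream AUG classified as weak and initiate at a downstream AUG in a strong context.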

ORF-1 is homologous to the TGEV ORF-X3 and present
at the same location relative to the N gene. However,
due to an in-frame insertion in FIPV or to a deletion
in TGEV, ORF-1 is 69 nucleotides longer. The
predicted ORF-X3 product contains hydrophobic
segments of about 25 residues at the N- and C-terminus;
these segments are separated by a hydrophilic
central region (Kapke and Brian, 1986). The 23-residue
insert in the ORF-1 product enlarges this hydrophilic
region. Moreover, due to a point mutation, ORF-1 contains
a potential N-glycosylation site (Fig. 2) which is
absent in the ORF-X3 sequence as determined by
Kapke and Brian (1986). The ORF-X3 sequences determined
by Rasschaert et al. (1987) and Britton et al.
(1988) contain a potential glycosylation site located
four amino acid residues upstream of the ORF-1
glycosylation site. The features of the ORF-1 and ORF-X3
products are characteristic for membrane proteins.
ORF-1 and ORF-X3 may encode minor structural proteins
which have not yet been detected because of
their small size.

Fig. 2. The nucleotide sequence and the deduced amino acid sequences for the 3' end of the FIPV strain 79-1146 genome. Only the extreme
3' end of the nucleocapsid gene (N) is presented. The presumptive transcription-initiation signal is underlined. An arrow indicates the 5' end of
the "body" of RNA 6 as determined by S1 analysis. The presumptive initiating methionine of ORF-1 is boxed. Stop codons are indicated by
asterisks. Differences in the amino acid sequences of ORF-1 and TGEV ORF-X3 (Kapke and Brian, 1986) are also shown; the 69-nucleotide
segment, which is absent in TGEV ORF-X3, is indicated by a dashed line. Base 376 to base 1000 are also not present in the TGEV sequence.
The potential N-glycosylation site in ORF-1 is indicated by encircling the asparagine residue.

Fig. 3. (a) Dot matrix comparison of the nucleotide sequence of the 3' ends from FIPV strain 79-1146 and TGEV strain Purdue (Kapke and
Brian, 1986). A schematic presentation of the results is depicted in (b). The ORFs are indicated by bars; black bars represent conserved
segments, insertions are depicted by white bars. The arrow indicates the 5' end of the "body" of RNA 6.

Like ORF-1, TGEV ORF-X3 is preceded by the presumptive
transcription-initiation signal AACTAAAC
(Kapke and Brian, 1986). However, it was unclear
whether in TGEV this ORF is contained in a separate
RNA (Kapke and Brian, 1986; Jacobs et al., 1986;
Rasschaert et al., 1987). Rasschaert et al. (1987) did not
detect such RNA species in Northern blots, but could
have missed it since low percentage agarose gels were
used. By cross-hybridization with an FIPV 3' end probe,
we detected a TGEV-specific, poly(A)-containing RNA
species of about 700 nucleotides in Northern blots.
Jacobs et al. (1986) previously observed this RNA species
in TGEV-infected cells, but considered it host specific.
However, its synthesis was not affected by
actinomycin D (Jacobs et al., 1986), which is a strong
indication for virus specificity.

ORF-2 is located on a 624-nucleotide segment,
which is absent at the 3' end of the TGEV genome. A
comparison with the partial TGEV sequence determined
by Rasschaert et al. (1987) showed that the
TGEV genome does not contain an ORF-2 homolog
downstream of the peplomer gene. Moreover, ORF-2-specific
probes did not cross-hybridize with the TGEV
mRNA 1 (not shown), indicating the absence of sequences
related to ORF-2 in the remaining part of the
TGEV genome. ORF-2 is not related to genes of mouse
hepatitis virus (MHV), infectious bronchitis virus (IBV) or
to any of the sequences in the NBRF and Swiss protein
databases. The predicted ORF-2 product is predominantly
hydrophilic but contains a hydrophobic segment
of 12 residues at the N-terminus.

ORF-2 and the 69-nucleotide segment in ORF-1 may
have been deleted in TGEV, but an intriguing possibility
is that FIPV has acquired these sequences by RNA
recombination. Homologous recombination in vitro has
been described for the MHV strains A59 and JHM
(Makino et al., 1986). The remarkable sequence divergence
at the 5' ends of the peplomer genes of TGEV
and FIPV suggests that similar events may also occur
in vivo (Jacobs et al., 1987). Recently, Luytjes et al.
(1988) discovered a striking sequence similarity between
a pseudogene contained in RNA 2 of MHV A59
and the hemagglutinin (HA) gene of influenza virus,
type C. This finding is best explained by a nonhomologous
recombination event. Conceivably, uptake of genetic
information via nonhomologous recombination
may account for FIPV ORF-2 and the various nonrelated
ORFs in other coronaviruses.


Fig. 4. (a) S1 nuclease mapping of the 5' end of the FIPV RNA
6. The uniformly labeled, anti-sense strand of the 950-bp PstI-TaqI
fragment was used as a probe. After S1 nuclease digestion the
protected fragments were analyzed in a 2% agarose gel. Lane 1,
hybridization to total poly(A)-containing RNA extracted from FIPV-infected
cells; lane 2, hybridization to yeast tRNA; lane 3, hybridization to
sucrose-gradient purified RNA 6 (fraction 28; de Groot et al., 1987a).
Sau3A-digested pUC8 was used as a molecular weight marker (lane
m). (b) The precise length of the protected fragment was determined
in a sequencing gel. Lane 1, marker; lane 2, protected fragment after
hybridization to purified RNA 6.


Fig. 5. Northern blot analysis of total poly(A)-containing RNA
isolated from FIPV- and TGEV-infected cells. The RNAs were separated
in a 1.5% agarose-formaldehyde gel, blotted to a nylon membrane,
and cross-hybridized to the FIPV 1600-bp PstI fragment of plasmid
B12. The molecular weights of the FIPV RNAs are presented. An
arrow indicates the 700-nucleotide RNA species of TGEV.